Domain Adaptation of a Broadcast News Transcription System for the Portuguese Parliament

نویسندگان

  • Luís Neves
  • Ciro Martins
  • Hugo Meinedo
  • João Paulo da Silva Neto
چکیده

The main goal of this work is the adaptation of a broadcast news transcription system to a new domain, namely, the Portuguese Parliament plenary meetings. This paper describes the different domain adaptation steps that lowered our baseline absolute word error rate from 20.1% to 16.1%. These steps include the vocabulary selection, in order to include specific domain terms, language model adaptation, by interpolation of several different models, and acoustic model adaptation, using an unsupervised confidence based approach.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Statistical Machine Translation of Broadcast News from Spanish to Portuguese

In this paper we describe the work carried out to develop an automatic system for translation of broadcast news from Spanish to Portuguese. Two challenging topics of speech and language processing were involved: Automatic Speech Recognition (ASR) of the Spanish News and Statistical Machine Translation (SMT) of the results to the Portuguese language. ASR of broadcast news is based on the AUDIMUS...

متن کامل

The L2F Broadcast News Speech Recognition System

Broadcast news play an important role in our lives providing access to news, information and entertainment. The existence of an automatic transcription is an important medium that not only can provide subtitles for inclusion of people with special needs or be an advantage on noisy and populated environments, but also because it enables data search and retrieve capabilities over the multimedia s...

متن کامل

Dynamic language modeling for European Portuguese

This paper reports on the work done on vocabulary and language model daily adaptation for a European Portuguese broadcast news transcription system. The proposed adaptation framework takes into consideration European Portuguese language characteristics, such as its high level of inflection and complex verbal system. A multi-pass speech recognition framework using contemporary written texts avai...

متن کامل

Dynamic Language Modeling for the European Portuguese

Up-to-date language modeling is recognized to be a critical aspect of maintaining the level of performance for a speech recognizer over time for most applications. In particular for applications such as transcription of broadcast news and conversations where the occurrence of new words is very frequent, especially for highly inflected languages like the European Portuguese. An unsupervised adap...

متن کامل

Online Temporal Language Model Adaptation for a Thai Broadcast News Transcription System

This paper investigates the effectiveness of online temporal language model adaptation when applied to a Thai broadcast news transcription task. Our adaptation scheme works as follow: first an initial language model is trained with broadcast news transcription available during the development period. Then the language model is adapted over time with more recent broadcast news transcription and ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008